ABSTRACT 

The need for accurate and current scientific information in 
the fast-paced, Internet-aware world has prompted the scientific community to 
develop tools that reduce the time and effort scientists need to make digital 
information available to all interested parties. The availability of such 
tools has made the Internet a vast digital repository of information. But the 
ad hoc manner in which information is gathered and organized on the Web 
makes access to such information a time-consuming and sometimes frustrating 
affair. Digital library systems have the potential for solving problems in 
maintaining high-quality scientific content delivered via the Web by 
providing tools for scientists to collect, verify, organize, manage, and 
update their collections. This paper describes an environment that reduces 
the effort and time required by scientists to share their data with other 
collaborators in an automated and asynchronous manner, thereby allowing them 
to focus mostly on their own scientific practice. The data is maintained as a 
collaborative collection in a digital library that can also be used as an 
educational resource. (Contains 22 references and 3 figures.) (Author) 
1. Introduction 

Digital library systems have the potential for solving problems in maintaining high-quality scientific 
content delivered via the web by providing tools for scientists to collect, verify, organize, manage, and 
update their collections. The ability of the Internet to bring geographically distant scientists and 
researchers together as a community provides the opportunity to build collaborative digital libraries that 
can provide a larger quantity and higher quality of data. The notions of ownership and attribution of 
intellectual and scientific property must be retained and the contributors must control sharing of their data. 

This paper describes an environment that reduces the effort and time required by scientists to share their 
data with other collaborators in an automated and asynchronous manner, thereby allowing them to focus 
mostly on their own scientific practice. The data is maintained as a collaborative collection in a digital 
library that can also be used as an educational resource [1]. The collection can be continually updated by 
the contributors and used or reused to build course content via tools provided by the digital library [2]. 

2. Research Context 

2.1 Digital Flora of Texas 

The Digital Flora of Texas (DFT) [3] project was established as an open, web-based digital resource. 
Participants currently include scientists from 43 contributing herbaria as well as many researchers at public 
and private organizations [4]. Each participating herbarium (or collection) has its own curator or group of 
curators who verify and validate the data they contribute. The DFT project has a wide array of web tools, 
collections and prototype systems that can be used for research or education. The DFT contains information 
visualization tools for displaying statistical and distributional maps of taxa of the flora of Texas. In 
addition, upload and data conversion tools provide an automated way for researchers to easily build 
collections and share them with the scientific community. The DFT collections are accessible through the 
web, making it easier for researchers, scientists, instructors and students to access accurate and up-to-date 
information. Instructors can use the collections to construct web-based content for their courses. These 
educational and scientific resources can be used and reused in many different contexts [5,6,7]. 

2.2 Collections 

DFT collections are built either by a group of collaborating researchers or by individual researchers. In 
either case, the collections are publicly available and all scientists/researchers can participate in verifying 
and validating the result. Researchers, who are geographically distributed, collaborate using the various 
tools provided by the DFT project to build and maintain the following collections: 

• The Herbarium Specimen Browser is the Internet’s only multi-herbarium specimen data portal. It 
offers unique filtering, mapping and listing displays for a growing mass (more than 240,000 currently) 
of specimens taken from the participating Texas herbaria [8]. 

• The Image Gallery provides access to 8000 images of plants representing families found in Texas [9]. 

• The Contributor collection contains contact information for all contributors who generate digital 
content for this community-based collaborative digital library [10,11]. 

The following collections are a result of individual research: 

• The Bibliography (Wilson) provides over 3000 bibliographic references for the Texas Flora [12]. 

• The Checklist of the Vascular Plants of Texas (Hatch et al.) refers to 180 families, ~1300 genera, and 
~5000 species [13]. 

• The Centex Flora (Reed) is a manual of the dicot flora of Brazos and surrounding counties that 
provides keys and descriptions for 104 families, 489 genera and 1104 species [14,15]. 

• The Texas Grasses (Hatch and Dawson) has keys, descriptions and images for 142 genera and ~550 
species in the grass family [16]. 

3. Design of a Collaborative Digital Library 

3.1 Model of Collaboration/Interaction with System Components 

In the design of a scientific collaborative digital library, it is of utmost importance for the contributor to 
maintain ownership of his/her scientific and intellectual property. The notion of ownership and control is 
maintained in the DFT because the contributors decide when to share their data and what part of their data 
to share. Automation is used only under the control of the contributor and is a two-step process: 1) uploading 
data to share and 2) incorporating data into the public collection. The sharing occurs at different times and 
from different geographic locations. 
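
The two-step model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `ContributorCollection`, its `staged` and `published` lists, and the `upload` and `build` methods are hypothetical names chosen here, not part of the DFT software.

```python
from dataclasses import dataclass, field


@dataclass
class ContributorCollection:
    """Hypothetical model of one contributor's data in a shared library."""
    owner: str
    staged: list = field(default_factory=list)     # uploaded, still private
    published: list = field(default_factory=list)  # visible to everyone

    def upload(self, records):
        """Step 1: stage records; nothing becomes public yet."""
        self.staged.extend(records)

    def build(self, requested_by):
        """Step 2: publish staged records, but only at the owner's request."""
        if requested_by != self.owner:
            raise PermissionError("only the owner may publish this data")
        self.published.extend(self.staged)
        self.staged.clear()


collection = ContributorCollection(owner="curator_a")
collection.upload([{"taxon": "Quercus stellata"}])  # invented sample record
print(len(collection.published))  # uploaded data is not shared yet
collection.build(requested_by="curator_a")
print(len(collection.published))  # shared only after an explicit build
```

Keeping staged and published records in separate lists is one simple way to make the contributor's explicit build step the only path to publication.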

3.2 Rationale for Choice of Tools 

The DFT is using the previously defined Flora of Texas Consortium (FTC) data model [17] as the common 
data format for its collections. Extensible Markup Language (XML) [18] is ideal for the FTC data model 
because it gives data a strong sense of structure and can be validated and maintained with little or no 
programming. A wide variety of Application Programming Interfaces (APIs) are also available that can 
convert a wide range of data formats to XML. 

Greenstone is an open source digital library system that automatically provides multiple, user-defined 
browsing and searching indexes on the collections it maintains. It provides plugins for converting many 
standard format files (e.g. Word, PDF) to XML and allows rebuilding of collections while they are in use 
(no downtime). It also provides a standard yet customizable user interface and compresses its documents, 
thereby saving space and, more importantly, mitigating the effect of long XML tags. 

3.3 System Architecture 

Contributors can upload their data and build or rebuild a collection in the DFT at any time. The contributors 
may use any program they wish to maintain their personal collections (e.g. Word, Excel, Access, Filemaker 
Pro). When they are ready to upload their data, they dump it in a flat ASCII file (an operation that is 
standard with all data management tools). The upload system converts the data into XML and validates it 
against the FTC data model. Errors, if any, are reported to the contributors. Records without errors are 
uploaded and are ready to be added to the collection. The uploaded data is not shared with the other 
collaborators until the contributor explicitly builds or rebuilds a collection of this validated data, making it 
available for public use. Other collaborators as well as non-specialists with an interest in the domain can 
access the collection using the common web-based user interface provided by Greenstone or by 
customizing their own user interface. This process relieves the scientists of the task of converting data 
into various acceptable formats, enabling them to channel more time and effort toward practicing their 
science (Figure 1). 
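
As a rough illustration of the convert-and-validate step, the following Python sketch turns records into XML and reports records that fail a check. It is a minimal sketch, not the DFT upload system: the field names in `REQUIRED_FIELDS`, the `<collection>` and `<specimen>` element names, and the sample rows are all assumptions made for this example, and the real FTC data model defines its own structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields; the real FTC data model defines its own.
REQUIRED_FIELDS = ("taxon", "county", "collector")


def validate(record):
    """Return a list of error messages for one record (empty if valid)."""
    return [f"missing value for '{name}'"
            for name in REQUIRED_FIELDS
            if not record.get(name, "").strip()]


def to_xml(record):
    """Wrap one validated record in a <specimen> element."""
    element = ET.Element("specimen")
    for name, value in record.items():
        ET.SubElement(element, name).text = value
    return element


def process_upload(records):
    """Split records into an XML tree of valid rows and an error report."""
    root = ET.Element("collection")
    errors = {}
    for number, record in enumerate(records, start=1):
        problems = validate(record)
        if problems:
            errors[number] = problems    # reported back to the contributor
        else:
            root.append(to_xml(record))  # ready to be added to the collection
    return root, errors


# Invented sample rows; the second one has an empty taxon field.
rows = [
    {"taxon": "Quercus stellata", "county": "Brazos", "collector": "J. Doe"},
    {"taxon": "", "county": "Travis", "collector": "J. Doe"},
]
xml_root, report = process_upload(rows)
print(ET.tostring(xml_root, encoding="unicode"))
print(report)
```

Only records with an empty error list reach the XML tree, mirroring the rule that records without errors are uploaded while the rest are reported to the contributor.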

The Herbarium Specimen Browser, Image Gallery and Contributor collections mentioned in section 2 are 
collections that are built and maintained by geographically distributed contributors and follow the same 
methodology as described above. To upload his/her data, the contributor uses the web-based upload system 
[19] to specify the number and order of fields, field separators and field enclosures. Figures 2 and 3 show 
the upload form and an example search results page for the Image Gallery collection. 
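
The upload parameters named above (number and order of fields, field separator, field enclosure) map naturally onto the options of a delimited-text parser. The sketch below uses Python's standard `csv` module; the function name `parse_upload`, the field names, and the sample text are hypothetical and serve only to illustrate the idea.

```python
import csv
import io


def parse_upload(text, field_names, separator=",", enclosure='"'):
    """Parse delimited text into dicts using the contributor's field spec."""
    reader = csv.reader(io.StringIO(text), delimiter=separator,
                        quotechar=enclosure)
    records, errors = [], []
    for row_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != len(field_names):
            errors.append(f"row {row_number}: expected "
                          f"{len(field_names)} fields, found {len(row)}")
            continue
        # The order of field_names gives the meaning of each column.
        records.append(dict(zip(field_names, row)))
    return records, errors


# Invented sample: '|' separates fields and single quotes enclose them.
sample = "'Quercus stellata'|'Brazos'|'J. Doe'\n'Ilex vomitoria'|'Travis'\n"
records, errors = parse_upload(sample, ["taxon", "county", "collector"],
                               separator="|", enclosure="'")
print(records)
print(errors)
```

Rows with the wrong number of fields are collected as errors rather than raising an exception, so that one bad row does not block the rest of an upload.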



4. Lessons Learned 

4.1 Scientific Practice 

“The products of Systematic Botany, previously generated locally as static, hardcopy documents, can now 
be presented as collaborative enterprises from distributed centers as high-content, dynamic data resources 
that are constantly updated and refined.” [1] The system architecture of the DFT enhances the sharing and 
browsing aspects of collaboration by providing a web-based input system that converts the uploaded data to 
standard XML format. This data is then added to the shared collection under control of the contributor. 
Thus, scientists can follow this process without disrupting their normal working style. This automated 
process shortens the time needed to share scientific information, and the notion of ownership of intellectual and 
scientific property is maintained, as the scientists control their own collections and share whenever they are 
ready. The upload system also filters the data and performs various error checks (e.g. typographical error 
checks) in order to verify the data. Scientists appreciate the fact that their data is verified by the input 
system and by the information visualization tools (e.g. the mapping system) provided to browse through the 
digital library. 
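
The paper does not say how the typographical error checks are implemented. One plausible approach, shown here purely as an assumption, is to compare each uploaded value against a controlled vocabulary and suggest close matches. The sketch uses `difflib.get_close_matches` from the Python standard library; the vocabulary `KNOWN_COUNTIES` and the function `check_spelling` are hypothetical.

```python
import difflib

# Hypothetical controlled vocabulary; a real one would come from the library.
KNOWN_COUNTIES = ["Brazos", "Travis", "Burleson", "Grimes"]


def check_spelling(value, vocabulary, cutoff=0.8):
    """Return None if value is known, else a (possibly empty) suggestion list."""
    if value in vocabulary:
        return None
    return difflib.get_close_matches(value, vocabulary, n=3, cutoff=cutoff)


for county in ["Brazos", "Brazoz", "Xyz"]:
    suggestions = check_spelling(county, KNOWN_COUNTIES)
    if suggestions is None:
        print(f"{county}: ok")
    elif suggestions:
        print(f"{county}: unknown, did you mean {suggestions[0]}?")
    else:
        print(f"{county}: unknown, no suggestion")
```

Returning suggestions instead of silently correcting the value leaves the decision with the contributor, which fits the ownership model of the system.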

4.2 Education 

We have observed that collaborators use and reuse either their own information content or that of the other 
collaborators in the digital library to build new course structures and educational packages. Tools provided 
with the digital library expedite the process of reuse and construction of new packages. Students are also 
presented with a larger and more diverse collection and are not restricted by the limitations of a localized 
collection. In most cases, we observed that collaborators found it useful to have cross-collection 
querying/browsing, wherein the results of a search on one collection are used to search an entirely different 
collection. This exploratory learning methodology helps students find the information they are looking for 
while following the trails that interest them most. Greenstone supports this cross-collection 
querying/browsing model, enabling the user to move seamlessly from one collection to another. 
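
The idea of feeding the results of one search into a second collection can be sketched with two small in-memory lists. This is not how Greenstone implements the feature; the collections `SPECIMENS` and `IMAGES`, their contents, and the functions `search` and `cross_search` are invented for illustration.

```python
# Two tiny in-memory "collections"; contents are invented for illustration.
SPECIMENS = [
    {"taxon": "Quercus stellata", "county": "Brazos"},
    {"taxon": "Ilex vomitoria", "county": "Brazos"},
    {"taxon": "Quercus stellata", "county": "Travis"},
]
IMAGES = [
    {"taxon": "Quercus stellata", "file": "q_stellata_01.jpg"},
    {"taxon": "Ilex vomitoria", "file": "i_vomitoria_01.jpg"},
    {"taxon": "Prosopis glandulosa", "file": "p_glandulosa_01.jpg"},
]


def search(collection, **criteria):
    """Return records whose fields equal every given criterion."""
    return [record for record in collection
            if all(record.get(key) == value
                   for key, value in criteria.items())]


def cross_search(first, second, link_field, **criteria):
    """Search `first`, then look up each distinct linked value in `second`."""
    linked_values = sorted({record[link_field]
                            for record in search(first, **criteria)})
    return [record
            for value in linked_values
            for record in search(second, **{link_field: value})]


# Find images of every taxon that has a specimen from Brazos County.
for image in cross_search(SPECIMENS, IMAGES, "taxon", county="Brazos"):
    print(image["taxon"], image["file"])
```

The shared `taxon` field acts as the link between the two collections, which is why a common data model across collections makes this kind of browsing possible.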



4.3 Digital Libraries 



The immediate benefit of having a collaborative digital library is that the information content is continuously 
updated and refined, and its accuracy improved, through the collaborative interactions of the 
scientists. Since it is fairly easy to maintain and browse the digital library, we find more and more 
collections being built using the system we have described. For example, the Image Gallery collection was 
developed because the scientists involved found the model of collaboration easy to use. Collections do 
indeed have their own unique identities and are a scientifically valid resource, but they also tend to become 
integrated in the notion of a digital library that allows cross-collection querying/browsing. 
Figure 1. Architecture for DFT Collaborative Digital Library Collections. 
5. Future Work 



Currently, the system does not have collaborative tools to support a peer review process by a group of 
curators [20]. With such a set of tools, any person with botanical interests and a computer connected to the 
Internet could contribute to the digital library. The curators would be responsible for accepting or rejecting 
contributions and for maintaining high quality content. Considering the ease of creating and maintaining 
collections, we would like to put more content in the digital library in the form of a monthly journal, a 
collection of theses and dissertations, and a collection of videos of lectures and field trips, thus adding 
various media formats to the digital collection and enhancing the value of the collections. Currently, 
collaborating scientists and university students are the main beneficiaries of the DFT. It would also be 
interesting to see how well the system scales to the needs of primary and secondary students. 